Towards the Development of a Hybrid Parser for Natural Languages

Authors

  • Sardar F. Jaf
  • Allan Ramsay
Abstract

To understand natural languages, we have to be able to determine the relations between words; in other words, we have to be able to ‘parse’ the input text. This is a difficult task, especially for Arabic, which has a number of properties that make it particularly hard to handle. There are two approaches to parsing natural languages: grammar-driven and data-driven. Each approach poses its own set of problems, which we discuss in this paper. The goal of our work is to produce a hybrid parser that retains the advantages of the data-driven approach but is guided by grammar rules in order to produce more accurate output. The work consists of two stages. The first stage is to develop a baseline data-driven parser, which is guided by a machine learning algorithm for establishing dependency relations between words. The second stage is to integrate grammar rules into the baseline parser. In this paper, we describe the first stage of our work, which is now implemented, and a number of experiments that have been conducted on this parser. We also discuss the results of these experiments and highlight the factors that affect parsing speed and the correctness of the parser's output.
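
As a rough illustration of what a data-driven dependency parser "guided by a machine learning algorithm" can look like, the sketch below runs a shift-reduce loop in which a decision function picks the next action. This is only a sketch under stated assumptions: the abstract does not say which transition system or classifier the authors use, so the arc-standard actions, the names `parse` and `choose_action`, and the rule-based `toy_policy` standing in for a trained model are all hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical action names; the abstract does not name a transition system.
SHIFT, LEFT_ARC, RIGHT_ARC = "shift", "left_arc", "right_arc"

Arc = Tuple[int, int]  # (head index, dependent index)


def parse(words: List[str],
          choose_action: Callable[[List[int], List[int]], str]) -> List[Arc]:
    """Arc-standard style loop; choose_action stands in for a trained classifier."""
    stack: List[int] = []
    buffer: List[int] = list(range(len(words)))
    arcs: List[Arc] = []
    while buffer or len(stack) > 1:
        action = choose_action(stack, buffer)
        if action == SHIFT and buffer:
            # Move the next input word onto the stack.
            stack.append(buffer.pop(0))
        elif action == LEFT_ARC and len(stack) >= 2:
            # The top of the stack becomes the head of the item beneath it.
            dependent = stack.pop(-2)
            arcs.append((stack[-1], dependent))
        elif action == RIGHT_ARC and len(stack) >= 2:
            # The item beneath the top becomes the head of the top.
            dependent = stack.pop()
            arcs.append((stack[-1], dependent))
        else:
            raise ValueError(f"action {action!r} is not valid in this state")
    return arcs


def toy_policy(stack: List[int], buffer: List[int]) -> str:
    """Placeholder for a learned model: shift everything, then attach rightwards."""
    return SHIFT if buffer else RIGHT_ARC


arcs = parse(["a", "b", "c"], toy_policy)
```

In a real data-driven parser, `toy_policy` would be replaced by a classifier trained on treebank-derived parser states. A hybrid parser of the kind the abstract describes could additionally use grammar rules to veto or re-rank the actions that the classifier proposes, although the abstract does not specify how that integration is done.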

Similar resources

Feature Engineering in Persian Dependency Parser

A dependency parser is one of the most important fundamental tools in natural language processing; it extracts the structure of sentences and determines the relations between words based on dependency grammar. Dependency parsing is well suited to free-word-order languages such as Persian. In this paper, a data-driven dependency parser has been developed with the help of a phrase-structure parser fo...

Studying impressive parameters on the performance of Persian probabilistic context free grammar parser

In linguistics, a treebank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of treebank data has been important ever since the first large-scale treebank, the Penn Treebank, was published. However, although they originated in computational linguistics, the value of treebanks is becoming more widely appreciated in linguistics research as a whole. F...

Grammars and Programming Languages: To Further Narrow the Gap

Symbolic parser/grammar combinations can be viewed as programming systems for natural language processing applications. From this perspective, they can be compared with conventional programming systems, and seen to require more effort in the important development activities of testing and debugging. This paper describes tools associated with the RH (Retro-Hybrid) parser that facilitate these ac...

GULiveR: GENERALIZED UNIFICATION BASED LR PARSER FOR NATURAL LANGUAGES

GULiveR (The Generalized Unification Based LR Parser) is an environment for parsing natural languages, written and developed at the Romanian Academy of Sciences, Center for Artificial Intelligence. As its name indicates, GULiveR is based on unification and its purpose is the development of applications involving natural language analysis. The environment consists of a parser, a parse table gene...

A Development Environment For Large-Scale Multi-Lingual Parsing Systems

We describe the development environment available to linguistic developers in our lab in writing large-scale grammars for multiple languages. The environment consists of the tools that assist writing linguistic rules and running regression testing against large corpora, both of which are indispensable for realistic development of large-scale parsing systems. We also emphasize the importance of ...

Publication date: 2013